Evaluation of Interestingness Measures for Ranking Discovered Knowledge

Authors

  • Robert J. Hilderman
  • Howard J. Hamilton
Abstract

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results obtained using synthetic data show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
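
To make the idea of ranking summaries by a diversity measure concrete, the following is a minimal Python sketch. It is an illustration only: the abstract does not list the thirteen measures or give their formulas, so the two measures used here (Shannon entropy and the variance of tuple probabilities about the uniform probability), the function names, and the toy tuple counts are all assumptions rather than the authors' definitions.

```python
import math

def probabilities(counts):
    """Convert the tuple counts of one summary into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def shannon_entropy(counts):
    """Shannon entropy in bits (assumed example of an information-theory measure)."""
    return -sum(p * math.log2(p) for p in probabilities(counts) if p > 0)

def variance_about_uniform(counts):
    """Sample variance of the probabilities about the uniform value 1/m
    (assumed example of a statistical measure)."""
    probs = probabilities(counts)
    m = len(probs)
    if m < 2:
        return 0.0
    q = 1.0 / m
    return sum((p - q) ** 2 for p in probs) / (m - 1)

def rank(summaries, measure, reverse=True):
    """Order summary names by the index value the measure assigns to each."""
    return sorted(summaries, key=lambda name: measure(summaries[name]), reverse=reverse)

# Hypothetical summaries: each maps to the counts of its generalized tuples.
summaries = {
    "by_region": [25, 25, 25, 25],
    "by_product": [70, 10, 10, 10],
    "by_store": [97, 1, 1, 1],
}

if __name__ == "__main__":
    for name, counts in summaries.items():
        print(name, shannon_entropy(counts), variance_about_uniform(counts))
    print(rank(summaries, variance_about_uniform))
```

Whether a high or a low index value should count as more interesting depends on the measure, so the sketch leaves the sort direction to the caller through the `reverse` parameter.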

Similar resources

Ranking the Interestingness of Summaries from Data Mining Systems

We study data mining where the task is description by summarization, the representation language is generalized relations, the evaluation criteria are based on heuristic measures of interestingness, and the method for searching is the Multi-Attribute Generalization algorithm for domain generalization graphs. We present and empirically compare four heuristics for ranking the interestingness of ...

Applying Objective Interestingness Measures in Data Mining Systems

One of the most important steps in any knowledge discovery task is the interpretation and evaluation of discovered patterns. To address this problem, various techniques, such as the chi-square test for independence, have been suggested to reduce the number of patterns presented to the user and to focus attention on those that are truly statistically significant. However, when mining a large data...

Knowledge Discovery and Interestingness Measures: A Survey

Knowledge discovery in databases, also known as data mining, is the efficient discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. An importan...

Heuristic for Ranking the Interestingness of Discovered Knowledge

We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique, and therefore, can be considered to be a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of diversity of a population: variance, the ...

Assessing the Interestingness of Discovered Knowledge Using a Principled Objective Approach

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be gener...

Principles for mining summaries using objective measures of interestingness

An important problem in the area of data mining is the development of effective measures of interestingness for ranking discovered knowledge. In this paper, we propose five principles that any measure must satisfy to be considered useful for ranking the interestingness of summaries generated from databases. We investigate the problem within the context of summarizing a single dataset which can be ...

Publication date: 2001